Abstract: The Football World Cup, as the world's favorite sporting
event, is a source of both entertainment and an overwhelming
amount of data about the games played. In this paper we analyse
the available data on football world championships from 1930
until today. Our goal is to rank the national teams based on all
matches played during the championships. For this purpose, we
apply the PageRank with restarts algorithm to a graph built from
the games played during the tournaments. Several statistics, such
as matches won and goals scored, are combined in different metrics
that assign weights to the links in the graph. Our results indicate
that the random walk approach, with the right metrics, can indeed
produce relevant rankings comparable to the official FIFA all-time
ranking board.

I. Introduction 

Football, being the world's most favored sport, draws people's
attention in every field, from simple entertainment to the more
complex objectives of statistics, research and data analysis.
Since the FIFA world cup first took place in 1930, around 20
tournaments have been held, each comprising about 64 matches, not
counting the qualification rounds [1], [2]. Therefore, there is a
significant amount of data that one could inspect, analyse and
draw conclusions from.

With that in mind, researchers are tackling problems regarding
playing strategy, ranking of teams or performance analysis from
different aspects, including economic, demographic, cultural and
climatic factors [3]. A team's game strategy, for example, can be
observed from a graph theory perspective by constructing a network
of passes between players. In this context, different centrality
measures can be used to determine the importance of particular
players [4]-[6]. Another subject of interest is modelling football
matches in terms of the scores during the game. For example, in [7]
the authors discuss a statistical model for scoring times in a
match.

Here we address the problem of ranking national football teams.
Our main task is to use the available statistics to come up with
an alternative ranking method for the football teams based on
their achievements at the world cups. There are different rating
methods currently in use and they produce relevant results. FIFA
have their own 4-year points-based FIFA/Coca-Cola rating system [8]
and world cup all-time ratings [9] that include all championships
since their origin. There are also the World Football Elo Ratings,
based on the rating system FIDE uses to rate chess players [10].

A good ranking method should not only take into account how many
times a team has won, but also consider how strong the defeated
opponents were. A victory against a stronger opponent is more
significant than a victory against a weaker one. One method that
incorporates such logic is the PageRank (random walk) method,
which is applicable to a wide variety of network-based problems
that require ranking in some way. Besides the well-known problem
of rating web pages [11], it is also utilized in social network
analysis, in tasks such as link prediction, information diffusion
and community detection [12]-[14]. It is also used in NLP for text
summarization and word sense disambiguation [15], [16]. For
previous attempts at employing the PageRank mechanism in sporting
events we refer the reader to [17]-[19].

The rest of the paper is organized as follows. In Section II
we present the ranking problem and the PageRank-based method for
solving it. We also give a description and statistics of the data
that was available to us. The obtained results are presented in
Section III, including a discussion and comparison to the official
rankings, and then we conclude the paper in Section IV.

II. Materials and Methods 

A. Data 

The data we used was obtained from 11v11, a website for football
statistics that contains all-time figures about the matches of the
world cup, qualification games inclusive [20]. For each national
team there is information on which countries they have played
against, the number of matches won, drawn and lost, as well as the
number of scored and conceded goals during all match-ups.
Throughout this paper we use the term match-up for a single game
played between two teams, and match-up pair for any two teams that
have played against each other. The dataset contains 210 countries
and statistics on 2335 match-up pairs, or 7141 games in total,
during which 20298 goals were scored. The average number of games
per match-up pair is 3.0582, and the average number of goals scored
per match-up pair is 4.3465. Mexico versus USA is the pair with the
largest number of games played against one another: 28 games, during
which around 100 goals were scored. Of these games, 15 were won by
the US, 6 were drawn and the other 7 resulted in a victory for
Mexico. The country with the most games played is Brazil, with about
200 matches. As expected, it is also the country with the most games
won and the most goals scored.


Table I

Set of tested weighting functions and their scores in normalized
number of inversions, used as the similarity metric to the official
rankings. Less is better.

#    Weighting function                                                     Inversions
1    f_{i,j} = \frac{l_{i,j}}{g_{i,j}} \cdot \frac{1}{G - g_{i,j} + 1}      0.032
2    f_{i,j} = \frac{l_{i,j}}{g_{i,j}}                                      0.038
3    [illegible]                                                            0.040
4    f_{i,j} = l_{i,j}                                                      0.040
5    [illegible]                                                            0.041
6    [illegible]                                                            0.043
7    [illegible]                                                            0.044
8    f_{i,j} = c_{i,j}                                                      0.044
9    [illegible]                                                            0.046
10   f_{i,j} = d_{i,j}                                                      0.050
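As a rough illustration of the kind of record this subsection describes, the sketch below stores one match-up pair and recomputes the average number of games per pair from the totals quoted above. The class name `MatchUpPair` and its field layout are assumptions made for this example only; they are not the format of the original dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchUpPair:
    """Aggregate results of one match-up pair, seen from team_a's side.

    Hypothetical layout, chosen for illustration only.
    """
    team_a: str
    team_b: str
    won: int    # games won by team_a
    drawn: int  # games drawn
    lost: int   # games lost by team_a

    @property
    def games(self) -> int:
        # Every game is either won, drawn or lost by team_a.
        return self.won + self.drawn + self.lost


# The Mexico versus USA pair, using the counts quoted in the text.
usa_mexico = MatchUpPair("USA", "Mexico", won=15, drawn=6, lost=7)

# Dataset totals quoted in the text.
TOTAL_GAMES = 7141
TOTAL_PAIRS = 2335
avg_games_per_pair = TOTAL_GAMES / TOTAL_PAIRS

print(usa_mexico.games)
print(round(avg_games_per_pair, 4))
```

Keeping wins, draws and losses per pair, rather than per game, mirrors the aggregate form in which the statistics are described here.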

B. Method 


The ranking method explored throughout this paper is the
PageRank with restarts algorithm applied to a graph built
from the supplied data [?]. Each national team is a single
node in the graph, and two nodes are linked if the two teams
(the match-up pair) have ever competed against each other in
a world cup tournament. The weight of the link is determined
by a weighting function that involves one or more metrics,
such as the number of games played between a match-up pair,
the number of won, lost and drawn games, or the number of
scored and conceded goals. The weighting functions we have
tested are given in Table I.

Within the functions we use the following notation:

f_{i,j}   weight of the link from node i to node j;
g_{i,j}   number of games played between the two teams;
l_{i,j}   number of games lost by team i amongst all the
          games i and j played;
w_{i,j}   number of games won by team i amongst all the
          games i and j played;
c_{i,j}   number of goals conceded by team i during all the
          games i and j played;
s_{i,j}   number of goals scored by team i during all the
          games i and j played;
d_{i,j}   number of games drawn between the two teams;
G         maximum number of games played between any
          match-up pair.

Another factor that affects the PageRank is the damping
factor. The damping factor corresponds to the probability that
a random walker discontinues the walk and jumps to a random
node [21]. Besides being necessary to ensure that the random
walk converges to a stationary distribution, the damping factor
is also intuitive. The intuition behind its use within our
match-ups network is the following: although the graph is dense,
not every team has played against every other. So when using
weighting metrics such as the loss ratio (function 1 in Table I),
the damping factor adds some winning chances to all the teams
that a given team has never played against. It also adds some
winning chances to a team that has never won a game within a
match-up.

The PageRank is calculated using the power method [22].
This method is an iterative algorithm (eq. 2) that finds the
dominant eigenvector, which corresponds to the invariant
distribution of the time a random walker spends at a certain
node, that is, the PageRank. By normalizing the adjacency matrix A
we get the transition probability matrix Q with elements as
given in eq. 1, where N is the number of nodes in the graph:

Q_{i,j} = (1 - d) \frac{A_{i,j}}{\sum_k A_{i,k}} + \frac{d}{N}    (1)

\pi^T = \pi^T Q    (2)

Note that Q is guaranteed to be irreducible and aperiodic
as a consequence of the nonzero damping factor d.

Table II

Number of games played and results for each match-up pair

Pair   Games   Results
A-B    3       A wins 2, B wins 1
A-C    3       A wins 2, C wins 1
A-D    3       A wins 3, D wins 0
B-C    3       C wins 3, B wins 0
B-D    3       D wins 3, B wins 0
C-D    3       C wins 1, D wins 2


Table III

The PageRank of each team in descending order

Team   Games   Win   PageRank
A      9       7     0.333
C      9       5     0.281
D      9       5     0.211
B      9       1     0.175


C. Example 

For the sake of demonstration, let us consider a toy example
that illustrates our goal. Suppose there are 4 teams and the
statistics for each pair are as shown in Table II. The graph
(Fig. 1) is built using the loss ratio as the metric (function 2
in Table I). Therefore the weight of a given link from i to j is
the fraction of their games that i has lost to j. For instance,
there is a link from A to C with weight 1/3 and also a link from
C to A with weight 2/3. That means that out of the 3 matches A and
C have played against each other, A has won 2, C has won 1 and no
matches were drawn.

Figure 1. Graph representation of the games played; the size of
each node is proportional to its PageRank.

The next step is the calculation of the PageRank. For this we
need the transition probability matrix, which is calculated
according to eq. 1 with a common damping factor value of 0.15.
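The toy computation can be sketched in a few lines of plain Python. The sketch below builds loss-ratio link weights from the results in Table II, forms a transition matrix by row-normalizing the weights and mixing in a uniform jump with probability d, and then iterates the power method. The helper names (`loss_ratio_matrix`, `pagerank`), the uniform restart over all nodes and the stopping tolerance are assumptions of this sketch rather than details taken from the paper, so the scores need not agree with Table III to the last digit, although the resulting ordering is the same.

```python
TEAMS = ["A", "B", "C", "D"]

# (team_x, team_y, wins_x, wins_y), copied from Table II; there are no draws.
RESULTS = [
    ("A", "B", 2, 1),
    ("A", "C", 2, 1),
    ("A", "D", 3, 0),
    ("B", "C", 0, 3),
    ("B", "D", 0, 3),
    ("C", "D", 1, 2),
]


def loss_ratio_matrix(teams, results):
    """A[i][j] is the fraction of the i-j games that team i lost to team j."""
    idx = {t: k for k, t in enumerate(teams)}
    n = len(teams)
    A = [[0.0] * n for _ in range(n)]
    for x, y, wins_x, wins_y in results:
        games = wins_x + wins_y
        A[idx[x]][idx[y]] = wins_y / games  # x lost wins_y games to y
        A[idx[y]][idx[x]] = wins_x / games  # y lost wins_x games to x
    return A


def pagerank(A, d=0.15, tol=1e-12, max_iter=1000):
    """Power iteration for pi^T = pi^T Q, with d the jump (restart) probability.

    Assumes every row of A has at least one positive weight.
    """
    n = len(A)
    Q = []
    for row in A:
        total = sum(row)
        Q.append([(1 - d) * a / total + d / n for a in row])
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[i] * Q[i][j] for i in range(n)) for j in range(n)]
        if sum(abs(u - v) for u, v in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


A = loss_ratio_matrix(TEAMS, RESULTS)
pi = pagerank(A, d=0.15)
ranking = [t for _, t in sorted(zip(pi, TEAMS), reverse=True)]
print(ranking)
```

Because links point from the loser to the winner, a team accumulates score from every opponent it has beaten, in proportion to that opponent's own score.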

Finally, the results are shown in Table III. A is the highest
ranked and B the lowest ranked team, as expected. Teams C and D,
on the other hand, have both won 5 games, as shown in Table III.
However, PageRank takes into account the strength of the defeated
opponent, not only the number of wins. As a result, team C is
ranked higher, since it has won a game against A, which is
considered a strong opponent, whereas team D has wins only against
weaker opponents.


III. Results and Discussion 


In order to find the most precise ranking, several different
weighting functions were tried, and almost all of them delivered
similar results. The results were evaluated by comparing the
PageRank ordering to the official world cup ranking. We used the
normalized number of inversions as the evaluation metric [23],
taking the official FIFA all-time rankings as the reference
ordering. The tested weighting functions and their scores are
listed in Table I. A lower score means the results generated using
the corresponding metric are more similar to the official ranking.
We only used the top 30 highest ranked teams in the comparison
because we wanted to give them higher priority and get their
ordering right at the cost of misplacing some of the lower rated
teams. The error of the weighting functions also depends on the
damping factor. The minimum is achieved when the damping factor
value is very small, around 0.05. That is the value we used in the
evaluations of the metrics shown in Table I. Fig. 3 shows the
errors (in normalized inversions count) for the top 5 metrics as
functions of the damping factor. As expected, the error increases
with the growth of the damping factor.


Table IV

Top 20 highest ranked national teams using the combination of
loss ratio and number of games the two teams played as weighting
function (function 1 in Table I) and a 0.05 damping factor. The
4th column gives their position in the official ranking.

#    Country          PageRank   Official
1    Brazil           0.040375   1
2    Italy            0.037992   3
3    Germany          0.033801   2
4    Netherlands      0.031052   8
5    Argentina        0.029159   4
6    England          0.029100   6
7    Spain            0.027904   5
8    France           0.025670   7
9    Czechoslovakia   0.025155   NA
10   Sweden           0.022882   10
11   Mexico           0.022034   13
12   Hungary          0.022014   16
13   Uruguay          0.020660   9
14   Belgium          0.020255   14
15   Portugal         0.020211   17
16   Poland           0.019528   15
17   Denmark          0.019206   25
18   Croatia          0.018993   27
19   Switzerland      0.016650   21
20   Yugoslavia       0.016466   NA


Figure 2. The graph of the match-ups built with the combination of
loss ratio and number of games the two teams played as weighting
function (function 1 in Table I). The size of each node corresponds
to its PageRank (damping factor of 0.05 used). For the sake of
clarity only the strongest links coming out of each node are shown.


Table IV shows the top 20 teams (for brevity), according to our
best weighting function. The 4th column contains the position of
each team on the official rankings board. The position is marked
green if the team holds the same place in both our ranking and the
official one. The position is marked with red if there is a large
displacement (Denmark and Croatia). If a team is not found in the
official ranking (Czechoslovakia and Yugoslavia in our case) its
position is marked with NA. Fig. 2 shows the match-ups graph. Each
team is a node in the graph represented by its national flag, and
the size of each node is proportional to its PageRank. In the
figure a portion of the links are omitted for the sake of clarity,
thus the real graph is much denser than it appears.
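The evaluation metric used in this section can be sketched as follows. Given the official positions of the teams, listed in the order produced by PageRank, an inversion is any pair of teams whose relative order differs from the official one. The sketch normalizes the count by the number of pairs, n(n-1)/2; this normalization is an assumption, since the exact one used for Table I is not stated here, so the value it produces is not directly comparable to the scores in Table I.

```python
def count_inversions(positions):
    """Count pairs (i, j) with i < j and positions[i] > positions[j].

    Quadratic time, which is fine for the short lists (n <= 30) used here.
    """
    n = len(positions)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if positions[i] > positions[j]
    )


def normalized_inversions(positions):
    """Inversions divided by the number of pairs (assumed normalization)."""
    n = len(positions)
    if n < 2:
        return 0.0
    return count_inversions(positions) / (n * (n - 1) / 2)


# Official positions of the first eight teams in Table IV,
# listed in PageRank order (Brazil, Italy, Germany, ...).
official_positions = [1, 3, 2, 8, 4, 6, 5, 7]

print(count_inversions(official_positions))
print(normalized_inversions(official_positions))
```

A score of 0 means the two orderings agree on every pair, while a score of 1 under this normalization means the PageRank order is the exact reverse of the official one.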

A possible issue when using PageRank as a ranking method is the
following: a node can obtain a high PageRank score if it has a
highly ranked neighbour from which it receives a significant
amount of votes, or if it has many low ranked neighbours. In our
case, if a national team is highly ranked then it must have either
defeated many low ranked teams or achieved remarkable results
against a highly ranked opponent. This property of the random walk
affects our results, especially since we treat all matches equally,
without taking into account whether a match is a qualification
round or a final game. As a result, there might be teams that have
received a high ranking only because they have played and won
against many low ranked opponents in less significant qualification
matches.



Figure 3. The error in normalized number of inversions of the first 5
weighting functions in Table I as a function of the damping factor.


IV. Conclusion 

Throughout this paper we explored the PageRank method for
ranking national football teams. Our results showed that even
with simple weighting functions, such as the ratio of goals scored
or matches won, the PageRank algorithm yields promising results.
The rankings this method produced are similar to the official
FIFA all-time rankings. However, it is difficult to evaluate
whether PageRank with a more sophisticated weighting function and
more features within the dataset could lead to a better rating
scheme than the official one. Nevertheless, under the assumption
that the FIFA ranking system is proper and accurate, the random
walk approach, despite the simple dataset and weighting metrics,
can replicate its results to a great extent.